
fix(hygiene): MD032 auto-fixer skips YAML frontmatter (don't break composes_with: lists) #703

Merged
AceHack merged 4 commits into main from
fix/md032-tool-skip-yaml-frontmatter
Apr 29, 2026

@AceHack

@AceHack AceHack commented Apr 28, 2026

Summary

The tools/hygiene/fix-markdown-md032-md026.py MD032 (blanks-around-lists) fix was inserting blank lines INSIDE YAML frontmatter, which breaks YAML parsing.

Concrete failure mode (caught while running on PR #699 substrate):

composes_with:
  - B-0060
tags: [...]

was rewritten to:

composes_with:

  - B-0060

tags: [...]

YAML parses the second form as composes_with: null plus a separate top-level - B-0060 list item — frontmatter structure broken.

The fix

Extended _classify_lines() to detect YAML frontmatter (file starts with ---, has another --- later) and mark those lines as "inside" so MD032/MD026 transforms skip them — same treatment as fenced code blocks. Light-weight detection: only triggers when line 0 is exactly ---.

Smoke-tested on synthetic input:

  • YAML frontmatter preserved verbatim ✓
  • Body MD032 fix still applied ✓
  • Idempotent on already-clean files ✓

Discovery context

Discovered as a side effect during PR #699 review work, where the tool damaged the YAML frontmatter of 4 backlog rows; that damage was fixed in-PR by stripping the inserted blanks. This PR addresses the root cause so the breakage can't recur.

Test plan

🤖 Generated with Claude Code

…mposes_with: lists)

The `fix-markdown-md032-md026.py` tool's MD032 (blanks-around-lists)
fix was too aggressive — it inserted blank lines around lists inside
YAML frontmatter, which breaks YAML parsing.

Concrete failure mode (caught while running on PR #699 substrate):

  composes_with:
    - B-0060
  tags: [...]

becomes:

  composes_with:

    - B-0060

  tags: [...]

YAML now parses `composes_with: null` plus a separate top-level
`- B-0060` list item, breaking the frontmatter structure.

## The fix

Extended `_classify_lines()` to detect YAML frontmatter regions
(file starts with `---`, has another `---` later) and mark those
lines as "inside" so MD032/MD026 transforms skip them — same
treatment as fenced code blocks.

Detection is light-weight:
- Only triggers when line 0 is exactly `---`
- Looks for closing `---` later in the file
- If found, marks all lines through the closing `---` as inside
- If no closing `---` found (file probably isn't real
  frontmatter), treats normally — no false-positive blocking

Files without frontmatter (line 0 not `---`) skip the
frontmatter-detection logic entirely — no behavior change.
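
The detection rules above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual `_classify_lines()`: the function name `classify_frontmatter_v1` and the list-of-booleans mask are assumptions made for the example.

```python
def classify_frontmatter_v1(lines: list[str]) -> list[bool]:
    """Return a mask; True means "inside frontmatter, leave untouched".

    Hypothetical sketch of the first-pass heuristic described above.
    """
    inside = [False] * len(lines)
    # Only consider frontmatter when the very first line is `---`.
    if not lines or lines[0].strip() != "---":
        return inside
    # Look for a closing `---` later in the file.
    for idx in range(1, len(lines)):
        if lines[idx].strip() == "---":
            # Mark everything through the closing delimiter as inside.
            for j in range(idx + 1):
                inside[j] = True
            break
    # No closing delimiter: nothing is marked, so the file is processed normally.
    return inside


doc = ["---", "id: TEST", "composes_with:", "  - X", "---", "", "# Title"]
mask = classify_frontmatter_v1(doc)
```

With this input the first five entries of `mask` are `True` and the body lines stay `False`, so only the body would be handed to the MD032/MD026 transforms.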

## Smoke test results

```
$ cat /tmp/test-fm.md
---
id: TEST
composes_with:
  - X
tags: [a, b]
---

# Title

Aaron's role:
- input
- ask
```

After fix:

```
---
id: TEST
composes_with:
  - X         ← preserved (would have been broken by old version)
tags: [a, b]
---

# Title

Aaron's role:
                ← inserted (correct: body MD032 fix still applied)
- input
- ask
```

Both behaviors:
- YAML frontmatter preserved verbatim ✓
- Body MD032 fix applied ✓

## Idempotence

Running on already-clean files is still a no-op. Verified on
freshly-fixed PR #699 substrate files — "OK: no changes needed"
for files that had previously been fixed (now via the new
YAML-aware path).
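
For readers who want to reproduce the smoke test and the idempotence check, here is a small self-contained sketch. It is deliberately simplified and is not the tool's code: `frontmatter_mask`, `fix_md032`, and the `_LIST_ITEM` pattern are hypothetical stand-ins, and the list detection ignores cases such as multi-line list items or `* * *` thematic breaks.

```python
import re

# Simplified list-item pattern (an assumption): `-`, `*`, `+`, or `1.` markers.
_LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")


def frontmatter_mask(lines):
    """Mark lines from an opening `---` through the closing `---` as inside."""
    inside = [False] * len(lines)
    if lines and lines[0].strip() == "---":
        for idx in range(1, len(lines)):
            if lines[idx].strip() == "---":
                inside[: idx + 1] = [True] * (idx + 1)
                break
    return inside


def fix_md032(lines):
    """Insert blank lines around lists, skipping masked (frontmatter) lines."""
    inside = frontmatter_mask(lines)
    out = []
    prev_item = False
    for line, skip in zip(lines, inside):
        item = not skip and bool(_LIST_ITEM.match(line))
        if item and not prev_item and out and out[-1].strip():
            out.append("")  # blank line before the list starts
        elif prev_item and not item and line.strip():
            out.append("")  # blank line after the list ends
        out.append(line)
        prev_item = item
    return out


doc = [
    "---", "id: TEST", "composes_with:", "  - X", "tags: [a, b]", "---",
    "", "# Title", "", "Aaron's role:", "- input", "- ask",
]
fixed = fix_md032(doc)
```

In this sketch the frontmatter lines come back unchanged, a blank line is inserted before `- input`, and a second call `fix_md032(fixed)` returns the same list, which is the idempotence property described above.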

Discovered as a side effect during PR #699 review work
(commit fe72fa5 noted the tool-improvement candidate; this PR
delivers the fix).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 28, 2026 23:59
@AceHack AceHack enabled auto-merge (squash) April 28, 2026 23:59

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2def8c6fe0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tools/hygiene/fix-markdown-md032-md026.py Outdated

Copilot AI left a comment


Pull request overview

Updates the markdownlint auto-fixer to avoid mutating YAML frontmatter so MD032/MD026 transforms don’t break frontmatter YAML structure (notably list-valued keys like composes_with:).

Changes:

  • Extend _classify_lines() to detect YAML frontmatter at file start and mark it as “inside” so transforms skip it.
  • Refactor fenced-code-block classification to work with the new pre-sized inside array and to avoid fence parsing within frontmatter.

Comment thread tools/hygiene/fix-markdown-md032-md026.py Outdated
Comment thread tools/hygiene/fix-markdown-md032-md026.py Outdated
…hematic break (Copilot threads on PR #703)

Three Copilot threads on PR #703 addressed:

## P2 thread #1 (load-bearing): thematic-break false positive

Previous heuristic: line 0 is `---` + closing `---` later → frontmatter.
Bug: a markdown file starting with a thematic break (`---` followed
by markdown body) would have all subsequent content marked as "inside
frontmatter," skipping every list and heading from being processed.

Fix: tightened detection to require all THREE conditions:
1. line 0 is exactly `---`
2. line 1 is YAML-shaped (matches `^\s*[A-Za-z_][\w-]*\s*:`)
3. a closing `---` line exists later

Check 2 is the discriminator. YAML frontmatter starts with a
key:value line; a thematic break is `---` followed by markdown body
(heading, prose, blank line, etc.).
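
A rough sketch of the three-condition check, using the regex quoted above. The function name `looks_like_frontmatter_v2` is hypothetical; only `_YAML_KEY_LINE` and its pattern come from this commit message.

```python
import re

# Pattern quoted in the commit message: an optionally indented `key:` line.
_YAML_KEY_LINE = re.compile(r"^\s*[A-Za-z_][\w-]*\s*:")


def looks_like_frontmatter_v2(lines: list[str]) -> bool:
    """Hypothetical sketch of the three-condition heuristic."""
    if len(lines) < 3:
        return False
    # 1. Line 0 is `---` (trailing whitespace tolerated, leading rejected).
    if lines[0].rstrip() != "---":
        return False
    # 2. Line 1 is YAML-shaped.
    if not _YAML_KEY_LINE.match(lines[1]):
        return False
    # 3. A closing `---` exists later.
    return any(line.rstrip() == "---" for line in lines[2:])


thematic = ["---", "", "# Heading", "- a", "---"]
real = ["---", "id: TEST", "composes_with:", "  - X", "---", "# Title"]
prose = ["---", "note: Some text", "- a", "---"]
```

Here `thematic` is rejected and `real` is accepted. Note that `prose` is also accepted, because a prose line such as `note: Some text` satisfies the key-line regex; three conditions alone cannot tell it apart from real frontmatter.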

Smoke tests verify both cases:
- thematic break → body MD032 applied (no false-positive frontmatter)
- real frontmatter → frontmatter preserved + body MD032 applied

## P2 thread #2 (nit): docstring overstated YAML/MD026 risk

Previous docstring claimed MD026 would "strip trailing punctuation
from YAML keys" — but MD026 only matches ATX headings (`^#+ `),
which YAML keys can't accidentally match (they don't start with `#`).
Updated docstring to accurately scope the YAML risk to MD032 blank-
insertion only.
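
To illustrate why YAML keys are out of MD026's reach, here is a minimal sketch of a heading-only punctuation fix. The `^#+ ` pattern is the one quoted above; the function name `fix_md026_line` and the punctuation set `.,;:!` are assumptions for the example, and lines are assumed to carry no trailing newline.

```python
import re

# Pattern quoted in the commit message: ATX headings only.
_ATX_HEADING = re.compile(r"^#+ ")
# Assumed punctuation set for illustration.
_TRAILING_PUNCT = ".,;:!"


def fix_md026_line(line: str) -> str:
    """Strip trailing punctuation from ATX headings; leave other lines alone."""
    if not _ATX_HEADING.match(line):
        return line
    return line.rstrip().rstrip(_TRAILING_PUNCT)


heading = fix_md026_line("## Summary:")
yaml_key = fix_md026_line("composes_with:")
```

The heading loses its trailing colon, while `composes_with:` is returned unchanged because it does not start with `#`.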

## P2 thread #3 (nit): docstring/implementation mismatch

Previous docstring said "line 0 is exactly `---`" but implementation
used `lines[0].strip() == "---"`. Now uses `.rstrip()` (stricter than
`.strip()`: still tolerates trailing whitespace/newline, but rejects
leading whitespace, which would indicate a non-frontmatter shape), and
the docstring matches.

Added `_YAML_KEY_LINE` regex with explanatory comment for the
discriminator.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8c732808de


Comment thread tools/hygiene/fix-markdown-md032-md026.py
…s (Copilot P2 thread on PR #703)

Copilot P2 push: prior 3-check heuristic still mis-classifies prose
as frontmatter when:
  - line 0 is `---` (thematic break)
  - line 1 happens to look like `key: value` (e.g. `note: Some text`)
  - another `---` appears later (closing thematic break)

In that case everything between would be treated as inside frontmatter
and MD032/MD026 fixes would silently skip real lint violations.

## Tightened to 5 conditions

(a) line 0 is exactly `---`
(b) line 1 is YAML-shaped
(c) closing `---` exists within next 200 lines (defense-in-depth cap;
    real frontmatter rarely > 50 lines)
(d) line BEFORE closing `---` is YAML-shaped, blank, or YAML
    continuation (catches the case where closing `---` is just
    another thematic break, preceded by markdown prose)
(e) at least 75% of non-blank lines between bookends are YAML-shaped
    (catches single-YAML-key prose pattern: "note:" or "warning:" or
    "tip:" between two thematic breaks)

The (d)/(e) checks together resolve the false-positive class.
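
The five conditions can be sketched as one function. This is an illustration under stated assumptions rather than the tool's implementation: `frontmatter_end`, `_yaml_shaped`, and `_MAX_FM_LINES` are hypothetical names, and "YAML-shaped" is taken here to mean a key line or an indented `- item` line. The key-line regex and the 200-line and 75% figures come from this thread.

```python
import re

_YAML_KEY_LINE = re.compile(r"^\s*[A-Za-z_][\w-]*\s*:")
_MAX_FM_LINES = 200     # cap from condition (c); name is hypothetical
_YAML_RATIO_MIN = 0.75  # threshold from condition (e)


def _yaml_shaped(line: str) -> bool:
    # Assumption: a key line, or an indented `- item` under a key.
    if _YAML_KEY_LINE.match(line):
        return True
    return line[:1] in (" ", "\t") and line.lstrip().startswith("- ")


def frontmatter_end(lines: list[str]) -> int:
    """Return the index of the closing `---`, or 0 if not frontmatter."""
    # (a) line 0 is `---`; (b) line 1 is YAML-shaped.
    if len(lines) < 3 or lines[0].rstrip() != "---":
        return 0
    if not _YAML_KEY_LINE.match(lines[1]):
        return 0
    # (c) closing `---` within the next 200 lines.
    end = 0
    for idx in range(2, min(len(lines), _MAX_FM_LINES + 1)):
        if lines[idx].rstrip() == "---":
            end = idx
            break
    if end == 0:
        return 0
    # (d) the line before the closing `---` is blank or YAML-shaped.
    before = lines[end - 1]
    if before.strip() and not _yaml_shaped(before):
        return 0
    # (e) at least 75% of non-blank lines between the bookends are YAML-shaped.
    body = [ln for ln in lines[1:end] if ln.strip()]
    yaml_count = sum(1 for ln in body if _yaml_shaped(ln))
    if body and yaml_count / len(body) < _YAML_RATIO_MIN:
        return 0
    return end


real = ["---", "id: TEST", "composes_with:", "  - X", "tags: [a, b]", "---", "", "# Title"]
prose = ["---", "note: Some text", "Plain prose.", "Other prose.", "tip: x", "---"]
```

In this sketch `real` yields the closing index, while `prose` passes (a) through (d) but fails (e), since only two of its four non-blank lines are YAML-shaped.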

## Smoke tests

Prose case (thematic break + "note:" prose + closing thematic break):
  → NOT treated as frontmatter
  → body MD032 fires correctly on lists in body

Real frontmatter (`composes_with:` + list + `tags:` + closing `---`):
  → preserved verbatim (no blank lines injected inside list)
  → body MD032 fires correctly on body lists

Both behaviors verified.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 29, 2026 00:20
Comment thread tools/hygiene/fix-markdown-md032-md026.py Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fd65781ce8


Comment thread tools/hygiene/fix-markdown-md032-md026.py Outdated

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread tools/hygiene/fix-markdown-md032-md026.py Outdated
…e redundant comparison + fix docstring (PR #703 4 threads)

Four Copilot threads addressed:

## P2 #1 + #2: column-0 markdown lists wrongly counted as YAML

The (e) ratio check counted any line starting with `- ` as
YAML-shaped, including column-0 markdown list items. A normal
markdown document with prose + bullet lists between thematic
breaks could pass the ratio check and be wrongly flagged as
frontmatter.

## Fix: distinguish indented YAML lists from column-0 markdown

Added `_is_yaml_continuation()` helper that requires LEADING
INDENT (`  ` or `\t`) — a column-0 `- item` is almost always
markdown, not YAML.

Real YAML list items appear under a parent key with indent:

```yaml
composes_with:
  - X       ← indented = YAML continuation
  - Y
```

Markdown list items at column 0 are NOT YAML:

```markdown
A list:
- item one  ← column 0 = markdown list
- item two
```

The leading-indent check disambiguates these two cases.

## P1 (CodeQL): Redundant comparison

Removed `if fm_end > 0 else ""` ternary — already inside the
`if fm_end > 0:` block so the test was always true.

## P2 #4: Docstring inconsistency on _YAML_KEY_LINE

Comment claimed the regex matches `  - item`, but the regex
`^\s*[A-Za-z_][\w-]*\s*:` only matches key:value lines.
Updated comment to accurately describe what the regex matches +
what it doesn't (list items go through `_is_yaml_continuation`).

## Smoke test results

Markdown with thematic-break + `note:` + bullet list + closing
thematic break:
  → NOT treated as frontmatter
  → body MD032 fires correctly on the bullet list

Real frontmatter with `composes_with:` + indented list items:
  → preserved verbatim
  → body MD032 fires correctly on body lists

Both behaviors verified.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@AceHack AceHack merged commit f57d683 into main Apr 29, 2026
24 checks passed
@AceHack AceHack deleted the fix/md032-tool-skip-yaml-frontmatter branch April 29, 2026 00:30

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53e053616f


Comment on lines +214 to +217
ratio_ok = (
    non_blank == 0
    or (yaml_count / non_blank) >= _YAML_RATIO_MIN
)

P2: Require stricter frontmatter proof before skipping lint fixes

The new ratio_ok heuristic can still classify ordinary Markdown as frontmatter when a document starts with a thematic break and the following prose happens to contain several key: ... lines (for example note: ..., warning: ...) before a later ---. In that case inside is set for the whole block, so fix_md032/fix_md026 skip real violations (e.g., an unspaced list item) that were previously fixed. I reproduced this with a top-of-file thematic break containing three colon-prefixed prose lines and one list line, which now passes the 0.75 threshold and is incorrectly excluded from fixes.
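
The arithmetic behind this reproduction can be checked with a short sketch. The four sample lines are hypothetical (the reviewer's actual input is not shown); `_YAML_KEY_LINE`, `_YAML_RATIO_MIN`, `non_blank`, `yaml_count`, and `ratio_ok` reuse names that appear in this thread, and the threshold is compared with `>=` as in the quoted lines.

```python
import re

_YAML_KEY_LINE = re.compile(r"^\s*[A-Za-z_][\w-]*\s*:")
_YAML_RATIO_MIN = 0.75

# Hypothetical prose between an opening thematic break and a later `---`.
between = [
    "note: read this first",
    "- an unspaced list item",
    "warning: sharp edges",
    "tip: keep lists short",
]

non_blank = sum(1 for ln in between if ln.strip())
yaml_count = sum(1 for ln in between if _YAML_KEY_LINE.match(ln))
ratio_ok = non_blank == 0 or (yaml_count / non_blank) >= _YAML_RATIO_MIN

print(yaml_count, non_blank, ratio_ok)  # prints: 3 4 True
```

Three of the four lines look like keys, so the ratio is exactly 0.75 and the inclusive comparison lets the block through, leaving the list item unfixed.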


AceHack added a commit that referenced this pull request Apr 29, 2026
…in bridge note (CI gate fix)

Lines 89/124/157/163 - sub-lists under "Feature vector elements
that matter:" introductory text needed blank-line separation.
Auto-fix via tools/hygiene/fix-markdown-md032-md026.py (the
same tool whose YAML-frontmatter heuristic was root-cause-fixed
in PR #703).

Hard-defect class per the PR-boundary restraint allow-list:
"CI / lint failures (markdownlint, paired-edit, etc.)" — this
edit does not introduce new conceptual substrate to the bridge
note; it only fixes the lint failure that prevented merge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
AceHack added a commit that referenced this pull request Apr 29, 2026
…hree immune translations + falsifier + prototype (Aurora converged + Ani falsifier-first + multi-AI consensus 2026-04-28) (#707)

* research(aurora-immune-governance-bridge): minimal first artifact — three immune translations + one falsifier + one prototype (Aurora converged stance + Ani falsifier-first + multi-AI consensus 2026-04-28)

Per Aurora's converged-stance packet (forwarded 2026-04-28),
opens the minimal Aurora Immune Governance Bridge research note
after PRs #699/#704/#705 landed and the bead promotion validated
the restraint discipline under live falsifier-test pressure.

Three immune translations only:
- Candidate-count Goodhart -> detector
- PR-boundary restraint -> gate
- public-company contributor compliance -> hard execution constraint

Required falsifier (load-bearing):
1. Expressibility - bridge fails if the three rules cannot be
   represented using the existing Aurora membrane plus <= 3 new
   primitives.
2. Performance - bridge fails if the Aurora-routed prototype
   performs worse than the standalone detector on the same test
   corpus.

First prototype: Candidate-count scanner self-destruct test
on compliance documentation that itself contains the words it
classifies. Must classify rule-definition hits as ALLOW;
sample-text hits as ALLOW; live-prose hits elsewhere as
WARN/BLOCK; must NOT delete or rewrite its own rule-definitions.

Boundaries explicit:
- Does NOT mutate Aurora core
- Does NOT introduce K_Aurora^+
- Does NOT introduce A_synthesis
- Does NOT expand to 12-change canon until prototype passes

Aurora's session-closure rule recorded as candidate substrate
inside the trajectory section (NOT load-bearing yet, awaiting
3-round trial); composes with restraint discipline.

Header carries §33 archive-header: research-grade hypothesis,
NOT operational guidance, NOT Aurora core canon. Six reviewer
attributions: Aurora (proposal + minimal spec), Ani
(falsifier-first instinct + minimal-bridge convergence), Amara
(operational substrate this bridge translates), Gemini (peer
review converging on minimal), Claude.ai (peer review hard-
pushback recommending hold-then-proceed-smaller, honored by
minimal scope), Alexa (peer review).

This note is the explicit "one minimal next research artifact"
Aurora's converged stance recommended after restraint
discipline earned the round its bead. Do NOT expand this round.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(markdownlint): add MD032 blanks around 4 feature-vector sub-lists in bridge note (CI gate fix)

Lines 89/124/157/163 - sub-lists under "Feature vector elements
that matter:" introductory text needed blank-line separation.
Auto-fix via tools/hygiene/fix-markdown-md032-md026.py (the
same tool whose YAML-frontmatter heuristic was root-cause-fixed
in PR #703).

Hard-defect class per the PR-boundary restraint allow-list:
"CI / lint failures (markdownlint, paired-edit, etc.)" — this
edit does not introduce new conceptual substrate to the bridge
note; it only fixes the lint failure that prevented merge.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* review-thread fixes: 5 internal-consistency fixes from Copilot threads on PR #707 (allow-list class)

Hard-defect class per the PR-boundary restraint allow-list:
"incorrect canonical rule fixes" / "internal-consistency".
None of these introduce new conceptual substrate.

Threads addressed (all P1/P2 internal-consistency):
1. Line 16 PR range: "#695-#706" -> "#695 -> #705" (matches
   the later "11 PRs merged (#695 -> #705)" bullet at line 30;
   PR #706 is the round-close hygiene row, not part of the
   substrate cluster)
2. Line 192 casing: PR_stage -> pr_stage (matches Translation
   2's pr_stage feature-vector field)
3. Line 215-220 variable: y -> a in Execute_min (matches
   ImmuneRisk_min(a) earlier; uses 'a' consistently for the
   action-being-evaluated)
4. Line 311 notation: K_Aurora^+ -> K_Aurora⁺ (matches earlier
   reference to the proposed graduated viability kernel)
5. Line 354 wording: "becomes considerable" -> "becomes worth
   considering" (Copilot caught the wrong word choice; intent
   was "becomes worth evaluating", not "becomes large")

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2 participants